ARCS: an aggregated related column scoring scheme for aligned sequences
Abstract
MOTIVATION: Biologists frequently align multiple biological sequences to determine consensus sequences and/or to search for predominant residues and conserved regions. Determining the conserved regions in an alignment is one of the most important of these activities. Because protein sequences are often several hundred residues long or longer, it is difficult to distinguish biologically important conserved regions (motifs or domains) from the rest of the alignment. Widely used tools such as Logos, Al2co, Confind, and the entropy-based method often fail to highlight such regions. A computational tool that highlights biologically important regions accurately would therefore be highly desirable.

RESULTS: This paper presents ARCS (Aggregated Related Column Score), a new scoring scheme for aligned biological sequences. The ARCS method considers not only the traditional character similarity measure but also the correlation between columns. In an extensive experimental evaluation using 533 PROSITE patterns, ARCS highlights the motif regions with up to 77.7% accuracy when the top three peaks are considered.

AVAILABILITY: The source code is available at http://bio.informatics.indiana.edu/projects/arcs and http://goldengate.case.edu/projects/arcs
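As a rough illustration of the two ingredients the abstract names (a per-column character similarity measure and correlation between columns), the sketch below computes Shannon entropy for single columns and mutual information for a pair of columns. This is not the ARCS formula, which the abstract does not state; the toy alignment, the function names, and the choice of entropy and mutual information as stand-ins are all assumptions made for illustration.

```python
import math
from collections import Counter

def column(alignment, j):
    """Return column j of an alignment given as equal-length strings."""
    return [seq[j] for seq in alignment]

def entropy(symbols):
    """Shannon entropy in bits; 0 means the column is perfectly conserved."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def mutual_information(col_a, col_b):
    """Mutual information in bits; higher means the two columns co-vary."""
    joint = list(zip(col_a, col_b))
    return entropy(col_a) + entropy(col_b) - entropy(joint)

# Hypothetical toy alignment (not data from the paper).
alignment = ["ACDA", "ACEC", "GCDA", "GCEC"]

# Per-column entropy: column 1 is fully conserved, the others are not.
per_column_entropy = [entropy(column(alignment, j)) for j in range(4)]

# Columns 2 and 3 change together; columns 0 and 2 vary independently.
mi_correlated = mutual_information(column(alignment, 2), column(alignment, 3))
mi_independent = mutual_information(column(alignment, 0), column(alignment, 2))
```

In this toy case a purely per-column measure ranks columns 2 and 3 the same as column 0, while the pairwise term separates them, which is the kind of extra signal a correlation-aware score can draw on.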
Similar resources
ARCS-Motif: discovering correlated motifs from unaligned biological sequences
MOTIVATION The goal of motif discovery is to detect novel, unknown, and important signals from biology sequences. In most models, the importance of a motif is equal to the sum of the similarity of every single position. In 2006, Song et al. introduced Aggregated Related Column Score (ARCS) measure which includes correlation information to the evaluation of motif importance. The paper showed tha...
Identifying DNA and protein patterns with statistically significant alignments of multiple sequences
MOTIVATION Molecular biologists frequently can obtain interesting insight by aligning a set of related DNA, RNA or protein sequences. Such alignments can be used to determine either evolutionary or functional relationships. Our interest is in identifying functional relationships. Unless the sequences are very similar, it is necessary to have a specific strategy for measuring-or scoring-the rela...
An assessment of substitution scores for protein profile-profile comparison
MOTIVATION Pairwise protein sequence alignments are generally evaluated using scores defined as the sum of substitution scores for aligning amino acids to one another, and gap scores for aligning runs of amino acids in one sequence to null characters inserted into the other. Protein profiles may be abstracted from multiple alignments of protein sequences, and substitution and gap scores have be...
Frequency of gaps observed in a structurally aligned protein pair database suggests a simple gap penalty function.
Gap penalty is an important component of the scoring scheme that is needed when searching for homologous proteins and for accurate alignment of protein sequences. Most homology search and sequence alignment algorithms employ a heuristic 'affine gap penalty' scheme q + r x n, in which q is the penalty for opening a gap, r the penalty for extending it and n the gap length. In order to devise a mo...
Recent Developments in Linear-Space Alignment Methods: A Survey
A dynamic-programming strategy for sequence alignment first proposed in 1975 by Dan Hirschberg can be adapted to yield a number of extremely space-efficient algorithms. Specifically, these algorithms align two sequences using only "linear space," i.e., an amount of computer memory that is proportional to the sum of the lengths of the two sequences being aligned. This paper begins by reviewing t...
Journal title:
- Bioinformatics
Volume 22, Issue 19
Publication date: 2006